class: center, middle, inverse, title-slide

# Lecture 7
## Paired Samples
### Psych 10 C
### University of California, Irvine
### 04/13/2022

---

## Paired samples

- Until now we have been working with an experimental design known as a **between subjects** design.

--

- This means that each participant in our sample can be included in only one group.

--

- For example, in the smokers data, a participant can be in either the smokers group or the non-smokers group, but not in both.

--

- We call this a **between subjects** design because we compare two groups made up of the responses of different participants.

--

- However, we could also have a design where we measure each participant in both groups.

---

## Example

- **Problem:** We are at the end of the semester at a university and we want to know if our students have improved their grades.

--

- There is variability in grades, but we would like to make a generalization and say that, as a group, the students either improved during the semester or did not.

--

- To solve this problem we have access to the students' scores on the midterm and final exams.

--

- Notice that the observations in the two groups will be correlated. This is a problem for our approach, because our models assumed that the observations were independent!

--

- The correlation in our observations comes from the fact that we are measuring the same participant twice. Student 1's midterm and final scores will have some degree of correlation because they come from the same person.

--

- To avoid the problem of correlated observations in this sort of context (observations are correlated because they come from the same participant), we need to get rid of the "redundant" information.

---

## Example

- A simple way to get rid of the correlation is to take the difference between the two scores (this only works when the correlation comes from having two observations of the same participant).
--

- In this case, by taking the difference between midterm and final scores we are trying to remove some amount of redundant information.

--

- This also means that we have to make some changes to our models.

---
class: inverse, center, middle

# Null model

---

## Null model

- To define our models we now need to add one step: taking the difference between our observations. Remember that this time we are interested in a model of the differences between scores.

--

- We denote the *i-th* observation of the midterm score as `\(y_{i1}\)` and the *i-th* observation of the final score as `\(y_{i2}\)`. We then define the difference between final and midterm as:

`$$d_{i} = y_{i2} - y_{i1}$$`

--

- Notice that now we only have one index, `\(i\)`, for the difference. In other words, we have a single difference score for each participant (there are no groups).

--

- The Null model assumes that, on average, there is no difference between midterm and final scores:

`$$d_i \sim \text{Normal}(0,\sigma_0^2)$$`

--

- Notice that we still assume that there will be variability in our observations `\((\sigma_0^2 > 0)\)`; however, we expect the difference between final and midterm scores to be around `\(0\)`.

---
class: inverse, center, middle

# Effects model

---

## Effects model

- Using the same notation, the Effects model formalizes our assumption that there will be a difference between final and midterm scores. We express this model as:

`$$d_i \sim \text{Normal}(\mu,\sigma_e^2)$$`

--

- This means that we expect the differences to follow a normal distribution centered around some value `\(\mu\)`, with a variance of `\(\sigma_e^2\)`.

--

- As in the previous examples, we have two models that we want to use in order to answer our research question.

--

- Do students' scores improve between the midterm and final exams?

---
class: inverse, center, middle

# Data analysis

---

## Variables and data visualization

- First, let's start by visualizing the midterm and final scores using a histogram.
--

- Use the file "exams-example.Rmd" in the examples directory on Canvas to write your answers.

---

## Removing dependency between observations

- Now create a new variable (question 2 of the "exams-example.Rmd" file) containing the difference between final and midterm scores.

--

- Make a box plot using the new difference variable (question 3).

---

## Models' predictions

- Add the predictions of the Null and Effects models to the exams data (question 4).

--

- Using the predictions of each model, add a new variable with the squared error of each observation (question 5).

--

- Calculate the Sum of Squared Errors (SSE) and the Mean Squared Error for each model using the last two variables you added (question 6).

---

## Model Evaluation

- Now we want to evaluate our models. We start by calculating the proportion of error (variance) accounted for by the Effects model `\((R^2)\)` (question 7).

--

- What proportion of the error (variance) is accounted for by the Effects model? (question 8)

--

- What are the BIC values of the Null and Effects models? (question 9)

---

## Conclusion

- Now we need to draw a conclusion and interpret it based on our original problem.

--

- Which model would you choose based on the BIC values calculated in question 9?

--

- What does our choice tell us about the improvement of the class from the midterm to the final?
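---

## A sketch of the analysis steps

The steps in questions 2 to 9 could be sketched in base R roughly as follows. This is only an illustration under stated assumptions: the `exams` data frame and its scores are made up (they are not the course data), all variable names are hypothetical, `\(R^2\)` is taken as the proportional reduction in SSE relative to the Null model, and BIC is assumed to have the form `\(n \log(\text{MSE}) + k \log(n)\)`, with `\(k\)` the number of estimated means.

```r
# Hypothetical midterm and final scores for six students
# (made-up values for illustration; NOT the course's exams data).
exams <- data.frame(
  midterm = c(70, 65, 80, 75, 60, 85),
  final   = c(75, 70, 78, 82, 66, 90)
)

# Difference between final and midterm: d_i = y_i2 - y_i1
exams$difference <- exams$final - exams$midterm
n <- nrow(exams)

# Predictions: the Null model predicts 0 for every difference,
# the Effects model predicts the sample mean of the differences.
exams$pred_null   <- 0
exams$pred_effect <- mean(exams$difference)

# Squared error of each observation under each model
exams$error_null   <- (exams$difference - exams$pred_null)^2
exams$error_effect <- (exams$difference - exams$pred_effect)^2

# Sum of squared errors and mean squared error
sse_null   <- sum(exams$error_null)
sse_effect <- sum(exams$error_effect)
mse_null   <- sse_null / n
mse_effect <- sse_effect / n

# Proportion of error accounted for by the Effects model
# (assumed definition: reduction in SSE relative to the Null model)
r_squared <- (sse_null - sse_effect) / sse_null

# BIC, assuming the form n * log(MSE) + k * log(n):
# k = 0 for the Null model (mean fixed at 0), k = 1 for the Effects model
bic_null   <- n * log(mse_null)   + 0 * log(n)
bic_effect <- n * log(mse_effect) + 1 * log(n)
```

With these made-up scores the Effects model ends up with the lower BIC, so it would be the preferred model here; the real exams data may lead to a different choice.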